Method and kit for determining human geographic or population origin

Field of invention 

The present invention is within the fields of human origins, medicine and 
evolutionary biology. More precisely, the invention relates to a method and kit for 
determining the geographic or population origin of a human based on mitochondrial 
DNA sequences and specifically to the use of mitochondrial DNA variants 
(polymorphisms) from the complete human mitochondrial genome. This information 
is employed in the comparison of biological samples with samples of known origin 
or with a database of mitochondrial genome sequences. 

Background of the invention 

Each individual, with the exception of identical twins, has a unique genetic 
constitution. Therefore, identification of the genetic origin can, in principle, be 
based on a comparison of the genetic material between e.g. a sample of unknown 
human origin and reference samples of known origin. Such an analysis is of 
relevance in maternity and paternity investigations, in immigration cases where 
familial relationships are disputed, and in medical and evolutionary biology research.

The genetic material (DNA) in humans is found in the cell nucleus (containing over 
99% of the material) and the mitochondrion (with less than 1%). The DNA in the 
nucleus is inherited with 50% from each parent, while the DNA in the 
mitochondrion (called the mtDNA) is derived solely from the mother (a process 
denoted maternal inheritance). Several characteristics of the mitochondrial genome 
make it different from that in the nucleus. The high mtDNA copy number per cell 
(1,000 - 10,000 copies/cell) allows for the analysis of materials with limited 
amounts of, or partly degraded, DNA (Bogenhagen and Clayton, 1974). The higher 
nucleotide substitution rate of mtDNA relative to most nuclear genes also increases 
the potential for individual identification (Brown et al. 1982). Finally, mtDNA is 
inherited uniparentally, through the maternal parent (Hutchison et al. 1974). 
Since the mtDNA of siblings and close maternal relatives is expected to be 
identical, with the exception of new mutations, individuals may be assigned to a 
maternal lineage. Due to the haploid, or clonal, nature of mtDNA, jumping PCR 
(Paabo et al. 1989), which commonly occurs in degraded DNA, does not cause 
erroneous results. By contrast, in analysis of nuclear loci, jumping PCR may result 
in the presence of chimeric sequences that complicate the interpretation of the 
allelic phase of polymorphisms.

Studies of human origins have been based on analyses of small segments of either 
nuclear DNA or mtDNA. The studies of mtDNA have been confined to a small 
segment of the mtDNA genome, denoted the D-loop (Higuchi et al. 1988, Wilson et 
al. 1993, Ginther et al. 1992, Hagelberg and Sykes 1989, Stoneking et al. 1991, 
Holland et al. 1993, Handt et al. 1994, Gill et al. 1994, Allen et al. 1998). No studies 
including the entire mtDNA for the purpose of determining the maternal lineage and 
deducing the geographic or population origin have been described. Analysis of the 
D-loop region has been complicated by the extreme variation in substitution rate
between different nucleotide sites (necessitating the exclusion of some sites from 
the analysis) and the relatively high rate of new mutations (potentially resulting in 
false inclusions/ exclusions). Different tissues of an individual may also show 
differences in the D-loop mtDNA, rendering comparisons between samples from 
maternal relatives but of different tissue type ambiguous. 

Studies have been performed to estimate the probability with which the major ethnic 
group (or geographic region) of an individual can be determined based on D-loop 
sequences (Allen et al. 1998). Using D-loop sequences alone, it is impossible to 
identify the geographic or population origin of an individual.

Studies of the human mitochondrial molecule have also been carried out through 
RFLP analysis, providing data on some DNA variants outside the D-loop. These 
analyses were carried out to study the evolutionary history of the human species, or 
to identify mutations causing mitochondrial disease, and were not performed for the 
purpose of deducing the geographic and population origin of an 
individual.

Summary of the invention 

The present invention provides a method for determining the geographic and 
population origin of an individual based on analysis of biological samples. 
According to the present invention, the analysis covers the entire 
mitochondrial DNA (mtDNA), and the sample under investigation is compared with 
samples of known origin or with a database of complete mitochondrial genome 
sequences. 

In a first aspect the present invention relates to a method for determining the
geographic and population origin of a human comprising the following steps: 

a) determining the complete nucleic acid sequence of the mitochondrial genome 
including polymorphic sites in a sample from a human subject whose origin or 
identity is studied; and

b) relating the information from step a) to mitochondrial nucleic acid sequence 
information of known origin. 

The body sample referred to above can be derived from body fluid or tissue. 

The known information in step b) may be derived from human subjects of known 
identity (reference subjects). Alternatively, the known information in step b) is 
derived from a database of nucleic acid sequence information for the complete 
mitochondrial genome of humans of diverse origin. This may be appropriate in
cases where samples from maternal relatives are not available, or when otherwise 
considered informative. 

The mitochondrial DNA sequence information is from the entire mtDNA molecule 
(about 16,500 nucleotides). Alternatively, the information is obtained from a 
fragment thereof, not including the D-loop. The analysis of the complete mtDNA 
sequence, or fragments thereof, may be carried out using existing technology for 
DNA sequencing. Analysis of the genetic markers may be based on DNA 
hybridisation assays (such as ASO hybridisation, DNA microchip, padlock), 
enzymatic ligation assays (OLA), enzymatic cleavage assays (Taqman), enzymatic 
extension assays (minisequencing, pyrosequencing) or other suitable techniques 
for DNA typing.

In a second aspect, the present invention provides a kit for determining the 
geographic and population origin of a human, comprising means for sequence 
analysis of the complete human mitochondrial DNA of unknown origin and 
mitochondrial DNA sequence information of known origin. This sequence 
information can be provided as a leaflet with printed sequence information or as a 
printed reference to a database. The means for sequence analysis can be any 
means used in the known DNA sequencing procedures described above. The 
sequence of unknown origin is compared with the known sequences or with a database 
of complete mitochondrial genomes, and conclusions about geographic and 
population origin can then be drawn.

In an alternative embodiment, the kit comprises means for analysis of polymorphic 
sites, preferably located outside the D-loop. Polymorphic sites useful for the 
purposes of the present invention are listed in Table 1. The D-loop is numbered 
from 16028 to 577 (i.e. 16028 to 16569 and 1-577). 
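
Because the D-loop interval wraps around the origin of the circular numbering, a membership test has to treat it as two ranges. The following is a minimal sketch in Python, assuming the 16569-nucleotide numbering implied by the ranges above; the function name `in_d_loop` is hypothetical and introduced only for illustration.

```python
# Illustrative helper (not part of the described kit); the boundaries are the
# D-loop coordinates stated in the text: 16028-16569 and 1-577.
GENOME_LENGTH = 16569
D_LOOP_START = 16028
D_LOOP_END = 577


def in_d_loop(position: int) -> bool:
    """Return True if a 1-based mtDNA position lies inside the D-loop."""
    if not 1 <= position <= GENOME_LENGTH:
        raise ValueError(f"position {position} is outside 1..{GENOME_LENGTH}")
    # The interval crosses the numbering origin, so either condition suffices.
    return position >= D_LOOP_START or position <= D_LOOP_END


if __name__ == "__main__":
    for pos in (73, 577, 578, 750, 16027, 16028):
        label = "D-loop" if in_d_loop(pos) else "outside the D-loop"
        print(pos, label)
```

A filter of this kind could be used to separate the sites of Table 1 that fall outside the D-loop from those inside it.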

In an alternative embodiment, the present invention provides a specific set of 
laboratory reagents and conditions (protocol) for determining the type of nucleotide 
at a selected set of the polymorphic sites in the mitochondrial genome, based on 
the pyrosequencing method.

Detailed description of the invention 

The invention will now be described more closely in association with the 
accompanying drawings, in which 

Figure 1 represents data matrices showing all informative nucleotide positions in 53 
of the 124 individuals for which the complete mitochondrial genome has been 
determined, in decreasing order of frequency, in: a) the whole mtDNA genome, 
excluding the D-loop and; b) the D-loop. The trees on the left are cladograms with 
the same topology and numbering of individuals as the tree in Figure 2. Individuals 
of African descent are found exclusively below the dashed line and non-Africans
above. The four major groups of sequences are boxed in blocks. The blocks denote 
groups of nucleotides identical in several sequences. 

Figure 2 shows a Neighbor-Joining phylogram based on complete mtDNA genome 
sequences (but excluding the D-loop) of 53 of the 124 individuals examined, 
constructed using PAUP*4.0 Beta (Sinauer Associates) and bootstrapped with 1000 
replicates (bootstrap values shown on nodes). The population origin of the 
individual is given at the twigs. The branches are shown block-wise as in Fig 1.
Individuals of African descent are found exclusively below the dashed line and non- 
Africans above. The node marked refers to the MRCA of the youngest clade 
containing both African and non-African individuals. 

Figure 3 shows mismatch distributions of pairwise nucleotide differences between 
53 mtDNA genomes (excluding the D-loop) for: a) African and; b) non-African 
individuals.

Figure 4 shows the distribution of the number of differences in pairwise comparisons of 
53 sequences for the D-loop only and for the complete mtDNA sequence.

Identification of genetic variation in the human mitochondrial genome 
To assess the global genetic diversity in human mtDNA we have determined the 
complete mtDNA genome of a total of 124 individuals. Individuals were selected so 
as to cover most of the genetic diversity in the human species.

The complete mtDNA genome was amplified in segments using PCR and the 
fragments sequenced. The primers used for PCR amplification were as described by 
Rieder et al. (1998). Sequencing was performed on the PCR products directly using
BigDye (Applied Biosystems) chemistry. Separation of sequencing ladders was 
performed on the ABI 377 instrument for automated fragment analysis. Both 
forward and reverse strands were sequenced. Sequence analysis was performed 
using Sequencing Analysis 3.3 (Applied Biosystems) and sequence alignment was 
made with Sequencher 3.1.1 (Gene Codes). 

All the 124 complete mtDNA sequences are unique. A total of 1122 polymorphisms 
were identified among these 124 individuals. A list of the sites, the types of 
alternative nucleotides found at each of these sites, and the frequency of these 
different nucleotides is shown in Table 1.

Our study of the entire mitochondrial genome has significant distinctions from 
previous studies of the D-loop. Most importantly, the sequences outside of the 
D-loop evolve in an approximately 'clock-like' manner, enabling a more accurate 
measure of mutation rate, and therefore improved estimates of times to 
evolutionary events (Ingman et al. 2000). The difference between the D-loop and the 
remaining molecule is visually evident in the contrast between the jumbled 
arrangement of polymorphic sites in the D-loop and the clear haplotypes defined by 
the sites in the rest of the molecule (Fig. 1). The Neighbor-Joining tree constructed 
from our mtDNA sequences has a strongly supported basal branching pattern (Fig. 
2). The three deepest branches lead exclusively to sub-Saharan mtDNAs, with the 
fourth branch containing both Africans and non-Africans. The deepest, statistically 
supported branch (NJ bootstrap = 100) provides compelling evidence of a human 
mtDNA origin in Africa. Our data show that the genetic variation (polymorphism) 
in the mtDNA genome outside the D-loop is subject to very little parallelism (i.e. 
very few independent substitutions in multiple mtDNA lineages at the same site), 
increasing the ability to determine the maternal lineage and deduce the 
geographic and population origin based on this information, while the D-loop has 
too many parallel substitutions at numerous sites to be useful for this purpose.

Utility of complete mtDNA genome variation for studies of human origin 

Our analysis of the complete mitochondrial DNA genome (excluding the D-loop) 
has resulted in the following:

1) A very large number of novel polymorphisms outside the D-loop have been 
detected, many of which show a strong population-specific distribution. This 
increases the ability to determine the specific maternal lineage for an individual 
and define the geographic and population origin of an individual based on a 
biological sample. 

2) A large database of complete mitochondrial genomes has been established, against 
which the genetic information from an individual can be compared and its geographic 
and population origin evaluated. For example, our database gives us an ability to identify
with a very high probability whether the sample is obtained from an individual 
of African or Caucasian origin. 
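
The text does not state the rule by which a sample is assigned to an origin, so the following Python sketch is only one plausible reading: it assumes that sequences are already aligned to equal length and reports the origin labels of the reference sequences with the fewest differences from the sample. The function names and the toy sequences are hypothetical and do not come from the database described here.

```python
from collections import Counter
from typing import List, Tuple


def count_differences(a: str, b: str) -> int:
    """Number of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def closest_origins(sample: str, references: List[Tuple[str, str]]) -> Tuple[int, Counter]:
    """Return the smallest distance and a tally of origins found at that distance.

    `references` is a list of (origin_label, aligned_sequence) pairs.
    """
    distances = [(count_differences(sample, seq), origin) for origin, seq in references]
    best = min(d for d, _ in distances)
    return best, Counter(origin for d, origin in distances if d == best)


if __name__ == "__main__":
    # Toy data for illustration only (assumption, not real mtDNA sequences).
    toy_references = [
        ("African", "ACGTACGT"),
        ("African", "ACGTACGA"),
        ("non-African", "TTGTACCT"),
    ]
    print(closest_origins("ACGTACGG", toy_references))
```

A nearest-reference rule is used here only because it is the simplest to state; a tree-based placement, as in Figure 2, would be an alternative.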

Kit for the analysis of a subset of mtDNA polymorphisms detected 
The variation in mtDNA between individuals has been studied mainly using Sanger 
sequencing. Compared with many other techniques for genetic typing, pyrosequencing 
is very quick, robust and easy to use (Ronaghi et al. 1996).

The kit developed for mtDNA determination in the present invention is based on the 
analysis of 10 PCR fragments covering highly informative sites in the entire 
mitochondrial genome (Table 2). The system is based on the analysis of 19 
pyrosequencing reactions, including 4 HVI, 4 HVII and 11 coding region reactions. 
This allows the analysis of some of the most informative regions of the entire 
mtDNA. The method enables the analysis of the D-loop and the coding regions of 
the mtDNA and can be used on a wide range of biological materials.



WO 02/22873 PCT/SE01/01691



The analysis of the mitochondrial D-loop is performed from two separate PCR 
fragments. The hypervariable region I (HVI) fragment is analyzed in four 
separate pyrosequencing reactions. Similarly, the hypervariable region II (HVII) 
fragment is analyzed in four separate pyrosequencing reactions. In order to evaluate 
the technique, mitochondrial DNA (mtDNA) from a number of control samples has 
been analyzed. In total, 190 samples were analyzed for HVII, 120 samples were 
analyzed for HVI and finally 50 forensic evidence materials were studied. 
The results were identical with sequencing data of the D-loop for the same 
individuals.

Pyrosequencing was further used to sequence 11 mtDNA coding regions in 36 
previously sequenced control samples. The 11 regions were chosen to cover the most
informative sites throughout the entire mitochondrial genome using the previously 
generated complete mtDNA genome sequences. The results obtained using 
pyrosequencing were 100% identical to those obtained by the Sanger sequencing 
method. 

To evaluate the system on limited amounts of DNA, 50 biological samples with small
amounts of DNA were analyzed for HVI and HVII by pyrosequencing. Among the 
materials analysed were samples from robber hoods, wigs, moustaches, shoes, 
cellular phones, watches, knives and guns. The pyrosequencing results were, for all 
the tested samples, identical to the results obtained using Sanger sequencing. 

Combining the results of the eight HVI and HVII pyrosequencing reactions covers 
altogether approximately 88% (396/448) of the nucleotides after manual editing of 
the D-loop sequences. Complete Sanger sequencing of the D-loop gives an HVII
fragment of 359 nucleotides and a HVI fragment of 403 nucleotides using the 
primer pairs L048/H408 and L15997/R16401 (Wilson et al. 1995). The 
pyrosequencing method covers 62% (222/359) of the nucleotides of the HVII 
fragment determined by Sanger sequencing and 56% (226/403) of the nucleotides 
of the HVI fragment determined by Sanger sequencing. In total, this gives a 59% 
coverage (448/762) of the D-loop determined by Sanger sequencing. 
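
The coverage figures quoted above can be checked with a few lines of arithmetic. The sketch below is in Python; the helper name `coverage_percent` is hypothetical, and all counts are taken directly from the preceding paragraph.

```python
def coverage_percent(covered: int, total: int) -> int:
    """Coverage as a percentage, rounded to the nearest whole number."""
    return round(100 * covered / total)


# Nucleotide counts quoted in the text.
HVII_COVERED, HVII_TOTAL = 222, 359
HVI_COVERED, HVI_TOTAL = 226, 403

if __name__ == "__main__":
    print("HVII:", coverage_percent(HVII_COVERED, HVII_TOTAL), "%")
    print("HVI:", coverage_percent(HVI_COVERED, HVI_TOTAL), "%")
    print(
        "D-loop total:",
        coverage_percent(HVII_COVERED + HVI_COVERED, HVII_TOTAL + HVI_TOTAL),
        "%",
    )
    print("After manual editing:", coverage_percent(396, 448), "%")
```

The two fragment counts sum to the totals used in the text (222 + 226 = 448 and 359 + 403 = 762).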

EXPERIMENTAL SECTION

Methodology for determining the complete mtDNA genome 

The PCR 

The mitochondrial genome is amplified in 24 overlapping fragments by the PCR 
technique, using 24 primer pairs, as described by Rieder et al. (1998). Both the
forward and reverse strands of these 24 DNA templates are sequenced using 
established techniques for DNA sequencing such as the BigDye Primer Cycle 
Sequencing Ready Reaction (Applied Biosystems) kits diluted 1:1 with 1x 
sequencing buffer, using the following components: 

1 µL of DNA template 
4 µL reaction mix (2 µL kit and 2 µL 1x sequencing buffer) 

Extension reactions in MJ Research Tetrad thermocycler with programs as follows: 

Forward reactions 
(ramping at 1°/second) 
96° 10 seconds 
55° 5 seconds 
70° 1 minute 
<14 cycles> 

96° 10 seconds 
70° 1 minute 
<14 cycles> 

Hold at 4° 

Reverse reactions 
(ramping at 1°/second) 
96° 10 seconds 
48° 5 seconds 
70° 1 minute 
<19 cycles> 

96° 10 seconds 
70° 1 minute 
<14 cycles> 

Hold at 4°

The extension products from these reactions are precipitated in 95% ethanol and 
the products spun down in a centrifuge at maximum speed for 25 minutes before 
removing the ethanol and allowing the pellets to air dry. 

The pellets are rehydrated in a bromothymol loading buffer and loaded on an ABI 
377 instrument for separation of sequencing ladders. A 96 well format was used 
allowing all the forward and reverse strands of 2 genomes to be run simultaneously. 
Sequencing gel analysis was performed using Sequencing Analysis 3.3 (Applied 
Biosystems) and the contigs were assembled into complete double stranded 
genomes with Sequencher 3.1.1 (Gene Codes). To aid in error checking, these 
completed genomes were then aligned with a consensus genome sequence of all the 
previously sequenced mtDNAs and all differences checked against the 
chromatograms for the new sequence. The new genome sequence could then be 
exported to the sequence database for analysis. 

Sequence Analysis 

The new complete mtDNA genome sequence is aligned with the other sequences in 
the reference database. Pairwise numbers of differences can be calculated against 
every sequence in the database, for instance using a computer program such as 
PAUP*. The resulting distance matrix is a list of the number of differences between 
the sequence under analysis and each reference sequence. From this information, 
the number of matches with 0, 1 and 2 differences between an individual sequence 
and those in the database can be calculated. In our database of 124 
complete mtDNA sequences, the number of variable sites when considering the 
D-loop sequences only is much lower than that for the complete sequence.
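
The comparison described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the PAUP* procedure itself: sequences are assumed to be pre-aligned strings of equal length, and the names `pairwise_differences`, `distance_list` and `match_counts` as well as the toy database are hypothetical.

```python
from collections import Counter
from typing import Dict


def pairwise_differences(query: str, reference: str) -> int:
    """Count aligned positions at which the two sequences differ."""
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for q, r in zip(query, reference) if q != r)


def distance_list(query: str, database: Dict[str, str]) -> Dict[str, int]:
    """Number of differences between the query and every reference sequence."""
    return {name: pairwise_differences(query, seq) for name, seq in database.items()}


def match_counts(query: str, database: Dict[str, str], max_differences: int = 2) -> Dict[int, int]:
    """How many reference sequences differ from the query by 0, 1, ..., max_differences."""
    tally = Counter(distance_list(query, database).values())
    return {d: tally.get(d, 0) for d in range(max_differences + 1)}


if __name__ == "__main__":
    # Toy aligned sequences for illustration only (assumption).
    toy_database = {"ref1": "ACGT", "ref2": "ACGA", "ref3": "AGGA", "ref4": "TTTT"}
    print(distance_list("ACGT", toy_database))
    print(match_counts("ACGT", toy_database))
```

Running the same tally once on the D-loop columns and once on the complete alignment would give the kind of comparison summarised in Figure 4.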

Methodology and reagents for the kit assaying the mtDNA polymorphisms detected 

DNA preparations 

Human DNA was extracted from PBLs of 190 Swedish blood donors, to serve as 
control samples. The forensic evidence material was purified by one of three 
methods. The Wizard Genomic DNA Extraction Kit (Promega) was used to extract 
DNA from most of the evidence material collected by cotton swabs and also from
blood. Chelex 100 (Bio-Rad) was used to extract DNA from bloodstains. The method 
has been developed for extracting DNA from forensic-type samples for use with PCR 
(Walsh et al. 1991). Hair samples were extracted with an extraction procedure that 
uses proteinase K and DTT (Vigilant 1999). 

Primer design and PCR 

The software Primer Express® (Applied Biosystems) was used for PCR and sequence 
primer design. An optimal selection has been made to cover most of the informative 
polymorphism in the D-loop, where hypervariable region I and II (HVI and HVII) are 
PCR amplified in two different PCR products (Table 2). The coding region 
polymorphisms are amplified in 11 separate reactions and are analysed in 11 
pyrosequencing reactions, using the forward PCR primer as sequencing primer. 
PCR amplification for control samples was set up in 70 µl containing 1.5 µl DNA, 
200 nM of a normal and a biotinylated primer (Table 2), 200 nM of each dNTP, 1.5 
mM MgCl2, 2 U AmpliTaq® Gold DNA Polymerase and 1x GeneAmp® PCR Buffer II. 
The PCR amplification for the forensic evidence material was set up in 100 µl with 
10 µl DNA, 200 nM of the normal and the biotinylated primer, 200 nM of each 
dNTP, 2.4 mM MgCl2, 10 U AmpliTaq® Gold DNA Polymerase, 1.2x GeneAmp® PCR 
Buffer II, 0.16 mg/ml BSA and 10 % glycerol. The amplifications were performed in
an ABI 9600 instrument (Applied Biosystems). The samples were kept for 10 min at 
95°C followed by 45 cycles of 30 sec at 95°C, 45 sec at 60°C (53°C for the coding 
region primers) and 60 sec at 72°C. The final extension was lengthened to 7 min. 
Tubes that contained all PCR components, but without template (NTC), were used 
to ensure that the reagents were free of contamination. 
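
The cycling conditions above can be written down as a simple data structure, which also makes the minimum run time easy to estimate. The Python sketch below assumes that the 7 min final extension is a separate step after the 45 cycles (the text could also be read as a lengthened last cycle) and ignores ramping between temperatures; the names `Step`, `build_program` and `total_hold_minutes` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    label: str
    temperature_c: int
    seconds: int


def build_program(annealing_c: int = 60, cycles: int = 45) -> List[Step]:
    """PCR program from the text; use annealing_c=53 for the coding region primers."""
    program = [Step("initial hold", 95, 10 * 60)]
    for _ in range(cycles):
        program.append(Step("denaturation", 95, 30))
        program.append(Step("annealing", annealing_c, 45))
        program.append(Step("extension", 72, 60))
    # Assumption: the final extension is an additional 7 min step.
    program.append(Step("final extension", 72, 7 * 60))
    return program


def total_hold_minutes(program: List[Step]) -> float:
    """Sum of programmed hold times in minutes, excluding ramping."""
    return sum(step.seconds for step in program) / 60


if __name__ == "__main__":
    d_loop_program = build_program(annealing_c=60)
    coding_program = build_program(annealing_c=53)
    print(len(d_loop_program), "steps,", total_hold_minutes(d_loop_program), "min")
    print("coding region annealing:", coding_program[2].temperature_c, "°C")
```

Under these assumptions the programmed hold time is just under two hours for either primer set.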

Template preparation and pyrosequencing reaction 

Streptavidin-coated beads were used as a solid phase support to obtain single- 
stranded biotinylated PCR products, as described by Pyrosequencing AB. Some 
reactions required 440 ng of Single Stranded DNA Binding Protein (SSB) 
(Amersham Pharmacia Biotech) added to the primed DNA template, prior to 
pyrosequencing (Ronaghi 2000). The sequencing was performed at room 
temperature and 15 µl of the 70 µl PCR reaction was used with 400 nM sequence
primer. Enzyme and substrate mixture (prototype of PSQ™ 96 SQA Reagent Kit) 
were added to all samples in a PSQ™ 96 System (Pyrosequencing AB) with a 
prototype of PSQ™ 96 SQA Software (sample entry, instrument control and 
evaluation). The procedure was carried out by stepwise elongation of the primer 
strand during sequential dispensation of different dNTPs (AαS, C, G and T) followed 
by degradation of nucleotides. Optimal cyclic dispensation orders were chosen for 
each fragment. The sequences were edited and compared with the Anderson et al. 
reference standard (Anderson et al. 1981). The control samples were sequenced 
using an ABI 377 instrument and BigDye Terminator chemistry.
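
To illustrate how a cyclic dispensation order translates into a read-out, the Python sketch below simulates an idealised pyrosequencing run: each dispensed nucleotide is incorporated as many times as it occurs consecutively at the current position of the strand being synthesised, and a dispensation that does not match gives a zero signal. This is a simplified model under stated assumptions (signal exactly proportional to the number of incorporations, no background), not the instrument software; the function name `simulate_pyrogram` and the toy sequence are hypothetical, while the order CTGA is one of those listed in Table 2.

```python
from typing import List, Tuple


def simulate_pyrogram(synthesized: str, order: str) -> List[Tuple[str, int]]:
    """Return (nucleotide, incorporations) for each dispensation, in order.

    `synthesized` is the strand built from the sequencing primer (5'-3');
    `order` is the cyclic dispensation order, for example "CTGA".
    """
    unknown = set(synthesized) - set(order)
    if unknown:
        raise ValueError(f"bases not covered by the dispensation order: {sorted(unknown)}")
    signals = []
    position = 0
    dispensation = 0
    while position < len(synthesized):
        nucleotide = order[dispensation % len(order)]
        run = 0
        # A homopolymer stretch is incorporated within a single dispensation.
        while position < len(synthesized) and synthesized[position] == nucleotide:
            run += 1
            position += 1
        signals.append((nucleotide, run))
        dispensation += 1
    return signals


if __name__ == "__main__":
    for nucleotide, incorporations in simulate_pyrogram("GGATC", "CTGA"):
        print(nucleotide, incorporations)
```

An order that follows the expected sequence closely produces fewer zero-signal dispensations, which is one reason to choose the order separately for each fragment.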

Table 1. List of nucleotide sites that are polymorphic in the 124 human 
mitochondrial genomes examined, together with the alternative nucleotides found at 
each site and the frequency of this polymorphism in the 124 individuals. Sites 
indicated by an asterisk (*) have been described previously in conjunction with 
studies of mitochondrial disease. Numbering according to Anderson et al. (1981).

Position      Change        No.
43*           C ins         3
59            T-C           1
61            C-T           2
62            G-A           2
62            G-T           1
63*           T-C           2
64*           C-T           6
66*           G-A           2
70            G-A           1
72*           T-C           2
73*           A-G           108
89*           T-C           6
92            G-A           4
93*           A-G           7
95*           A-C           3
103*          G-A           2
114*          C-T           1
125*          T-C           3
127*          T-C           3
128           C-T           3
131           T-C           1
143*          G-A           7
146*          T-C           27
150*          C-T           17
151*          C-T           8
152*          T-C           29
153*          A-G           2
182*          C-T           6
183           A-G           2
185*          G-A           2
185*          G-T           1
186*          C-A           6
189*          A-G           6
189*          A-C           6
194*          C-T           4
195*          T-C           26
195           T-A           3
198*          C-T           5
199*          T-C           6
200*          A-G           4
204*          T-C           5
207*          G-A           4
208           T-C           2
211           A-G           1
212           T-C           5
215*          A-G           [illegible]
217*          T-C           [illegible]
225*          G-A           [illegible]
227*          A-G           [illegible]
227           A-T           [illegible]
[illegible]   [illegible]   2
[illegible]   [illegible]   1
[illegible]   [illegible]   7
247           [illegible]   12
[illegible]   A del         [illegible]
[illegible]   [illegible]   1
[illegible]   [illegible]   1
252           T-C           1
[illegible]   A-G           121
[illegible]   [illegible]   1
[illegible]   [illegible]   1
[illegible]   AA del        [illegible]
[illegible]   A del         1
[illegible]   [illegible]   1
[illegible]   A-G           6
[illegible]   C ins         57
[illegible]   CC ins        15
[illegible]   [illegible]   3
310           T del         2
311*          C ins         121
316*          G-A           6
317           C-A           1
325*          C-T           1
339           A-G           1
357           A-G           1
[illegible]   [illegible]   1
[illegible]   A-G           1
385*          A-G           2
408           T-A           1
[illegible]   [illegible]   2
[illegible]   C-T           4
[illegible]   C-T           1
404           A-G           1
[illegible]   [illegible]   1
[illegible]   [illegible]   2
[illegible]   [illegible]   1
[illegible]   [illegible]   1
[illegible]   [illegible]   1
[illegible]   [illegible]   30
[illegible]   G-A           2
[illegible]   [illegible]   2
[illegible]   [illegible]   3
514           CACA ins      1
514           CA ins        5
514           CA del        37
549           [illegible]   4
547           A-G           1
548           C-T           1
571           C-T           1
574*          C ins         2
574*          CCC ins       3
574*          CCCCCC ins    3
574           A-G           [illegible]
591           C-A           1
593           T-C           [illegible]
629           T-C           [illegible]
663*          A-G           1
678           T-C           1
680           T-C           1
709*          G-A           11
710*          T-C           [illegible]
721           T-C           1
750*          G-A           1
753           A-C           1
769*          G-A           15
794           T-C           1
825*          T-A           12
827*          A-G           [illegible]
850           T-C           [illegible]
921           T-C           1
930*          G-A           1
942           A-G           [illegible]
1005          T-C           1
1007          G-A           1
1018*         G-A           15
1041*         A-G           1
1048          C-T           7
1119          T-C           3
1243          T-C           2
1375          C-T           2
1382*         A-C           1
1391          T-C           1
1397          T ins         1

Table 2. PCR and sequencing primers used for the analysis of mtDNA polymorphism. 

Primer        PCR           Sequence      Tm            Sequence (5'-3')                      Dispensation  SSB
name*         primer        primer        (°C)                                                order         required
II 45         -             X             51,3          ATG CAT TTG GTA TTT TCG TCT G         TCGA          -
II 162        -             X             [illegible]   CGC ACC TAG GTT CAA TAT TAG A         CTGA          -
II 216        -             X             44,3          TTA ATG CTT GTA GGA CAT AAT AA        CTGA          -
[illegible]   [illegible]   -             [illegible]   [illegible]                           -             -
C431          X             X             53,6          CAC CCC CCA ACT AAC ACA               ACGT          -
C637B         X             -             [illegible]   GTG ATG TGA GCC C[illegible]          -             -
C2988         X             X             51,8          CGA TGT TGG ATC AGG ACA               ACGT          -
[illegible]   X             -             52,4          GG TGG GTG TGG GTA TAA                -             -
C3403         X             X             56,8          CTA CGC AAA GGC CCC AA                CAGT          X
[illegible]   [illegible]   -             [illegible]   GCT AGG CT[illegible]                 -             -
C4156         X             X             52,5          CAA CTC ATA CAC CTC CTA TGA AA        ACGT          -
C4367B        [illegible]   -             [illegible]   [illegible]                           -             -
C4882         X             X             49            CCA TCT CAA TCA TAT ACC AAA           ATGC          -
C5138B        X             -             50            GGA GTT TAA GTT GAG TAG TAG GAA       -             -
C8665         X             X             52            CAA TGA CTA ATC AAA CTA ACC TCA       ATGC          -
C8803B        X             -             51,1          TAA ATG AGT GAG GCA GGA GT            -             -
C12346        X             X             47,9          CAC ACT ACT ATA ACC ACC CTA A         ACGT          X
C12541B       X             -             49,1          CTC AGT GTC AGT TCG AGA TAA           -             -
C12673        X             X             45,7          AAC ATT AAT CAG TTC TTC AAA           ACGT          X
C12861B       X             -             46,7          GTT GTA TAG GAT TGC TTG AA            -             -
[illegible]   X             X             [illegible]   ATG ACC CCA ATA CGC AAA               [illegible]   -
[illegible]   X             -             54,1          TGG GCG ATT GAT GAA AAG               -             -
C15883        X             X             50,2          GGC CTG TCC TTG TAG TAT AAA           ACGT          -
C16083B       X             -             52            GGT TGT TGA TGG GTG AGT C             -             -
C16496        X             X             48,5          GAC ATC TGG TTC CTA CTT CA            ACGT          X
C149B         X             -             47,7          ATG AGG CAG GAA TCA AA                -             -
I 16105       X             X             50,2          TGC CAG CCA CCA TGA ATA               CTGA          X
I 16168       -             X             45,1          CCA ATC CAC ATC AAA ACC               CTGA          X
I 16203       -             X             40,4          AGC AAG TAC AGC AAT CAA               CTGA          -
I 16266       -             X             42,1          CCC ACT AGG ATA CCA ACA               CTGA          -
I 16348 B     X             -             51,8          GAC TGT AAT GTG CTA TGT ACG GTA AA    -             -

*I = HVI, II = HVII, C = coding region, B = biotin labeled reverse primer. 

CLAIMS

1. A method for determining the geographic or population origin of a human 
comprising the following steps: 

a) determining the complete nucleic acid sequence of the mitochondrial genome 
including polymorphic sites in a sample from a human subject whose origin or 
identity is studied; and

b) relating the information from step a) to mitochondrial nucleic acid sequence 
information of known origin. 

2. A method according to claim 1, wherein the known information in step b) is 
derived from a database of nucleic acid sequence information from humans of 
diverse origin. 

3. A method according to claims 1 or 2, wherein the mitochondrial nucleic acid 
sequence is the complete nucleic acid sequence of the mtDNA genome, excluding 
the D-loop. 

4. A method according to claims 1 or 2, wherein the mitochondrial nucleic acid 
sequence comprises the polymorphic sites mentioned in Table 1. 

5. A method according to any of the above claims, wherein the human 
mitochondrial nucleic acid sequence is determined by DNA sequencing, or in the
case of genetic markers, on assays such as DNA hybridisation assays (ASO, SSO 
hybridisation, DNA microchip, padlock), enzymatic ligation assays (OLA, padlock) 
enzymatic cleavage assays (Taqman), enzymatic extension assays (minisequencing, 
pyrosequencing) or other assays for typing of genetic polymorphisms. 

6. A method according to claim 5, wherein the mitochondrial nucleic acid sequence 
is determined by pyrosequencing. 

7. A kit for determining geographic or population origin of a human, comprising 
means for analysis of a selected set of the genetic markers (polymorphism) from the 
sites listed in Table 1 of the present patent application.

8. A kit according to claim 7, wherein the selected set of genetic markers are 
outside the D-loop. 

9. A kit according to claim 7 or 8, wherein the means for analysis are selected from 
the reagents listed in Table 2 of the present patent application. 

AMENDED CLAIMS 

[received by the International Bureau on 22 January 2002 (22.01.02); 
original claims 1-9 replaced by amended claims 1-13 (2 pages)]

AMENDED CLAIMS under Article 19, PCT 

1. A method for determining the origin or identity of a human comprising 
the following steps: 

a) determining polymorphic sites in the complete nucleic acid sequence of 
the mitochondrial genome in a sample from a human subject whose origin or 
identity is studied; and

b) relating the information from step a) to mitochondrial nucleic acid 
sequence information of known origin. 

2. A method according to claim 1, wherein the known information in step b) 
is derived from a database of nucleic acid sequence information from 
humans of diverse origin. 

3. A method according to claims 1 or 2, wherein the mitochondrial nucleic 
acid sequence is the complete nucleic acid sequence of the mtDNA genome, 
excluding the D-loop. 

4. A method according to claims 1 or 2, wherein the mitochondrial nucleic 
acid sequence comprises the polymorphic sites mentioned in Table 1.

5. A method according to any of the above claims, wherein the human 
mitochondrial nucleic acid sequence is determined by DNA sequencing, or
in the case of genetic markers, on assays such as DNA hybridisation assays 
(ASO, SSO hybridisation, DNA microchip, padlock), enzymatic ligation 
assays (OLA, padlock) enzymatic cleavage assays (Taqman), enzymatic 
extension assays (minisequencing, pyrosequencing) or other assays for 
typing of genetic polymorphisms. 

6. A method according to claim 5, wherein the mitochondrial nucleic acid 
sequence is determined by pyrosequencing. 

7. A method according to any of the above claims, wherein the means for 
analysis are selected from the reagents listed in Table 2 of the present patent 
application.

8. A kit for determining origin or identity of a human, comprising means for 
analysis covering informative sites in the entire mitochondrial genome. 

9. A kit according to claim 8, wherein the informative sites are outside the D- 
loop. 

10. A kit according to claims 8 or 9, wherein the means for analysis are 
selected from the reagents listed in Table 2 of the present patent application. 

11. A kit according to any of the claims 8-10, wherein the means for analysis 
are amplifying primers, sequencing primers and means for detection of 
polymorphism.

12. A kit according to any of the claims 8-11, wherein the means for analysis 
are means for DNA sequencing, or in the case of genetic markers, on assays 
such as DNA hybridisation assays (ASO, SSO hybridisation, DNA microchip, 
padlock), enzymatic ligation assays (OLA, padlock) enzymatic cleavage 
assays (Taqman), enzymatic extension assays (minisequencing, 
pyrosequencing) or other assays for typing of genetic polymorphisms. 

13. A kit according to any of the claims 8-12, wherein the means for 
analysis are pyrosequencing means. 